1  Introduction


This report documents the progress to date on the Army Research Office/TRW study "Application of
Some New Artificial Neural System Information Processing Principles to Pattern Classification."
An important milestone has been reached: we have selected the pattern environment
and have developed a pre-processing tool that generates that environment for experimentation.

1.1  ARO  Study  Overview 

The overall goal of the ARO study is to provide a firm conceptual and theoretical foundation for
future work in artificial neural systems (ANS). In this effort we are exploring the use of three
information processing principles (hierarchical template pattern storage, temporal compression,
and simultaneous competitive template matching) and developing concise mathematical
formulations of these principles. This will be accomplished in six tasks:

•  The  primary  purpose  of  Task  1  is  to  select  the  pattern  environment  which  supports  the 
experimentation  and  develop  the  necessary  pre-processing.  This  part  of  Task  1  has  been 
completed  and  is  documented  in  this  interim  report.  Additional  work  on  pre-processing 
techniques  such  as  the  Fourier-Mellin  transform  will  be  done  as  the  study  progresses. 

•  The  work  under  Task  2  will  use  the  signal  environment  to  carry  out  experiments  with  a 
multi-slab  ANS. 

•  In Task 3, data reduction and analysis will be conducted using the experimental data
generated in Task 2.

•  In Task 4, theoretical principles of ANS information processing will be developed, using the
analysis  of  Task  3. 

•  Tasks 5 and 6 will document the work done in interim and final reports.

The  schedule  for  these  tasks  is  given  in  Figure  1. 


1.2  Signal  Environment  For  Experimental  Work 


The pattern environment that will be the substrate for the experimental work has been selected
as a result of our work under Task 1. We have decided to use a simple version of spoken English,
which we call Letter-English or Lenglish. A description of Lenglish is given in Section 2. The
initial signal input to the experimentation will be the Lenglish reading of selected source texts,
described in Section 3. Section 4 of this report outlines our experimental plans for exploring the
three processing principles using this pattern environment.

2  Lenglish  Description


The  basic  concept  of  Lenglish  is  to  associate  an  arbitrary  “sound”  spectrum  with  each  letter  in  the 
English  language.  Specifically,  each  letter  in  the  alphabet  is  assigned  one  vector  from  a  set  of  nearly 
orthogonal  32-dimensional  vectors.  The  zero  vector  is  used  to  represent  pauses  in  speech  between 
words.  Pauses  can  be  removed,  if  desired,  to  simulate  the  effect  of  word  merging.  To  simulate  the 
continuous  nature  of  spoken  language  there  is  a  smooth  transition  between  adjacent  letters  within  a 
word.  The  transition  vectors  are  generated  by  convex  combination  of  the  adjacent  letters’  vectors. 
All of the vectors are normalized to “unit” length except the zero vector and transition vectors to
or from the zero vector.
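
The letter coding described above can be illustrated with a minimal sketch. The report does not say how the nearly orthogonal vectors were chosen, so the random Gaussian construction here is an assumption, as are the names `make_codebook`, `transition`, and `unit`. Random directions in 32 dimensions tend to have small mutual overlap, which is one plausible reading of "nearly orthogonal."

```python
import math
import random
import string

DIM = 32  # dimensionality stated in the report


def unit(v):
    """Scale a nonzero vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def make_codebook(seed=0):
    """Assign each letter a random unit vector (assumed construction)."""
    rng = random.Random(seed)
    return {ch: unit([rng.gauss(0.0, 1.0) for _ in range(DIM)])
            for ch in string.ascii_lowercase}


def transition(a, b, t):
    """Convex combination (1 - t) * a + t * b, with 0 <= t <= 1."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


codebook = make_codebook()
ZERO = [0.0] * DIM  # pause between words

# A letter-to-letter transition vector is renormalized to unit length.
midpoint = unit(transition(codebook["a"], codebook["b"], 0.5))

# Largest overlap between any two distinct letter vectors.
max_overlap = max(abs(dot(codebook[p], codebook[q]))
                  for p in codebook for q in codebook if p < q)
```

Inspecting `max_overlap` gives a quick check of how close to orthogonal a particular codebook is. A different seed or a deliberately constructed orthogonal set could be substituted without changing the rest of the scheme.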

A pre-processing tool generates the signal input to the ANS by translating English text into
Lenglish. The tool first finds the spectrum associated with a letter in the text stream. This
spectrum is normalized and then repeated, or “sampled,” a user-specified number of times. The tool
then generates a user-specified number of transition vectors to the spectrum of the next letter in
the text. Blanks in the text separating words represent pauses in speech and so are translated into
the zero vector.
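
The translation steps above might be sketched as follows. This is an illustration under stated assumptions, not the tool itself: the function name `to_lenglish`, the parameters `n_samples` and `n_transitions`, the evenly spaced mixing weights, the random codebook, and the decision to drop punctuation are all hypothetical choices that the report does not specify.

```python
import math
import random
import string

DIM = 32  # dimensionality stated in the report


def unit(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def make_codebook(seed=0):
    """Random unit vector per letter (assumed); blank maps to the zero vector."""
    rng = random.Random(seed)
    book = {ch: unit([rng.gauss(0.0, 1.0) for _ in range(DIM)])
            for ch in string.ascii_lowercase}
    book[" "] = [0.0] * DIM
    return book


def to_lenglish(text, codebook, n_samples=3, n_transitions=2):
    """Translate text into a list of 32-dimensional frames."""
    # Assumption: characters outside the codebook (punctuation) are skipped.
    symbols = [c for c in text.lower() if c in codebook]
    frames = []
    for i, ch in enumerate(symbols):
        cur = codebook[ch]
        # Repeat ("sample") the current spectrum n_samples times.
        frames.extend(list(cur) for _ in range(n_samples))
        if i + 1 < len(symbols):
            nxt = codebook[symbols[i + 1]]
            touches_pause = ch == " " or symbols[i + 1] == " "
            for k in range(1, n_transitions + 1):
                t = k / (n_transitions + 1)
                mix = [(1.0 - t) * a + t * b for a, b in zip(cur, nxt)]
                # Transitions to or from a pause are left unnormalized.
                frames.append(mix if touches_pause else unit(mix))
    return frames


frames = to_lenglish("i think i can", make_codebook())
```

With the default settings each symbol contributes three sampled frames and each adjacent pair contributes two transition frames, so the length of the signal scales linearly with the length of the text.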

A  movie  has  been  made  which  shows  the  input  signal  generated  by  the  pre-processing  tool. 
Figure 2 shows one frame from the movie. The text rolls across the screen from right to left. A
fixed  arrow  indicates  the  current  letter  or  transition  being  processed.  The  window  below  the  text 
displays  the  spectrum  for  the  current  letter  or  transition. 

3  Text  Sources 

In  selecting  the  initial  source  text  to  be  translated  into  Lenglish,  we  wanted  a  source  with  a 
restricted vocabulary, relatively short words, and a lot of repetition. For this reason we decided
to use the text of children's books to define the language. We have chosen the classic “The Little
Engine That Could” by Watty Piper and the popular “Frog And Toad Are Friends” by Arnold
Lobel. “The Little Engine That Could” contains 1349 words and uses a vocabulary of 302 words.
The letter frequencies are given in Table 1. Most of the words are used infrequently; however, about
10 percent of the vocabulary (24 words) is repeated at least 10 times in the text. The vocabulary set,
with associated frequencies, is shown in Table 2. The second book, “Frog And Toad Are Friends,”
contains 2702 words and uses a vocabulary of 412 words. Over 20 of the words are repeated 20
times or more in the book. The letter and word frequencies are given in Tables 1 and 3.
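
Statistics of the kind quoted above (total words, vocabulary size, letter frequencies, and words repeated at least a given number of times) could be gathered with a short script such as the following sketch. The tokenization rule and the function name `text_statistics` are assumptions, and the sample sentence is made up for illustration; it is not taken from either book.

```python
import re
from collections import Counter


def text_statistics(text, min_repeats=10):
    """Count words and letters and list the frequently repeated words."""
    lowered = text.lower()
    # Assumed tokenization: runs of letters, optionally with one apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", lowered)
    word_counts = Counter(words)
    letter_counts = Counter(c for c in lowered if "a" <= c <= "z")
    frequent = sorted(w for w, n in word_counts.items() if n >= min_repeats)
    return {
        "total_words": len(words),
        "vocabulary_size": len(word_counts),
        "word_counts": word_counts,
        "letter_counts": letter_counts,
        "frequent_words": frequent,
    }


# Hypothetical sample text, used only to exercise the function.
stats = text_statistics("The frog and the toad and the frog.", min_repeats=2)
```

A different tokenization (for example, treating hyphenated words as one word) would change the counts slightly, so the totals reported for the two books depend on the convention the authors used.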

4  Experimental  Plans 

The Lenglish signal input derived from the source texts will be divided into training sets and
test sets. These sets are the input stimulus to a multi-slab ANS such as the design sketched in
Figure 3. Our approach to the ANS experimentation with the pattern environment uses a model
of unsupervised learning with competition. In this model, a competitive mechanism will be used
to discover a set of feature detectors for the input stimulus. This is essentially the competitive
learning of Rumelhart with an important extension. Rumelhart's work demonstrated how an
ANS can organize itself into feature detectors for spatial patterns. Our work under this study will
extend these results to the case of time-varying (spatio-temporal) patterns. During experimentation
the time-varying signal of the simulated speech will be presented to the ANS. When the system
reaches equilibrium, slab 1 will have discovered feature detectors for pattern primitives, possibly
letters. The outputs of the slab 1 feature detectors are then the input to slab 2, which uses the same
type of mechanism to discover feature detectors for higher level patterns, possibly letter pairs or
word fragments. Similarly, each slab in the multi-slab system uses the same basic design to discover
features in the input from the lower level slab. Success in self-organization is achieved if the system
reaches equilibrium. The training sets are used to allow the system to self-organize and the test
sets are used to determine if the system has indeed equilibrated on features of the Lenglish source.
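
The competitive mechanism for a single slab can be sketched in simplified form. The code below is a generic winner-take-all update for static (spatial) patterns, in which the unit that responds most strongly moves its weight vector toward the input. It is not the exact rule from Rumelhart's work, and it does not include the spatio-temporal extension planned for this study. The function names, learning rate, epoch count, and toy patterns are all assumptions.

```python
import math
import random


def unit(v):
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def winner(weights, x):
    """Index of the unit whose weight vector responds most strongly to x."""
    return max(range(len(weights)), key=lambda j: dot(weights[j], x))


def train_competitive(patterns, n_units, epochs=20, rate=0.2, seed=0):
    """Simplified winner-take-all learning on static patterns."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    weights = [unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
               for _ in range(n_units)]
    for _ in range(epochs):
        order = list(patterns)
        rng.shuffle(order)
        for x in order:
            j = winner(weights, x)
            # Only the winning unit learns: move toward x, then renormalize.
            weights[j] = unit([w + rate * (xi - w)
                               for w, xi in zip(weights[j], x)])
    return weights


# Hypothetical toy patterns standing in for three letter spectra.
patterns = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
weights = train_competitive(patterns, n_units=3)
assignments = [winner(weights, p) for p in patterns]
```

In a plain scheme like this, some units may never win and several patterns may share one detector, so whether each pattern receives its own feature detector depends on initialization. Checking `assignments` on held-out test patterns is one simple way to examine what the trained slab has settled on.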

Note that this application will simultaneously use all three of the information processing
principles of interest to this study. Hierarchical pattern storage will be used to code primitive features
of the pattern (i.e., letters) in the lowest level, and composite features such as letter sequences at
higher levels in the hierarchy. Temporal compression will be used to reduce the time variability of
higher level patterns. The recognition of time-varying patterns will occur as the avalanches compete
for the privilege of being feature detectors for spatiotemporal segments of the input language.

Results of these experiments, as well as our theoretical results aimed at deriving “design laws”
that  can  quantitatively  predict  system  behavior  and  performance,  will  be  given  in  later  Interim 
Technical  Reports  and  in  the  Final  Report. 

Table 3: Word Frequency for "Frog


